Block Prefetching for Numerical Codes

Authors

  • Josef Weidendorfer
  • Carsten Trinitis
Abstract

Cache optimization is a crucial technique for most numerical codes to exploit the performance of modern processors. It falls into two approaches: improving access locality and prefetching. Inherent algorithmic constraints often limit the first approach, which typically uses a blocking technique. While automatic prefetching mechanisms exist in hardware and/or compilers, they cannot complement blocking with additional prefetching. We describe application-controlled block prefetching, which allows numerical code already optimized by blocking to be improved further. We demonstrate its benefits on both synthetic code and matrix multiplication, using the second core of a dual-core processor as the engine that executes the block prefetching instructions.
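
As a rough illustration of the idea described in the abstract, the C sketch below pairs a blocked matrix multiplication with a helper thread that issues prefetch hints for the blocks of the next step while the compute thread works on the current one. It is not the authors' implementation. The block size BS, the 64-byte cache-line assumption, the shared progress counter, and the names matmul_block_prefetch, prefetcher, prefetch_block and decode are all hypothetical. `__builtin_prefetch` is a GCC/Clang builtin. The scheme can only help when the helper thread runs on a core that shares a cache with the compute thread (for example, a dual-core processor with a shared L2). Thread placement is not handled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical block size (doubles per dimension); tune so that the
 * blocks touched per step fit in the shared cache. */
#define BS 32

typedef struct {
    const double *a, *b;  /* row-major n x n input matrices */
    size_t n;
    atomic_long cur;      /* block step the compute thread is working on */
    atomic_int done;      /* set once the compute thread has finished */
} shared_t;

/* Issue prefetch hints for every cache line of the BS x BS block at (row, col).
 * Assumes 64-byte cache lines, i.e. 8 doubles per line. */
static void prefetch_block(const double *m, size_t n, size_t row, size_t col) {
    for (size_t i = row; i < row + BS; i++)
        for (size_t j = col; j < col + BS; j += 8)
            __builtin_prefetch(&m[i * n + j], 0, 3);
}

/* Decode a linear step index into block coordinates; matches the
 * ii -> jj -> kk loop nest in matmul_block_prefetch. */
static void decode(long s, size_t nb, size_t *ii, size_t *jj, size_t *kk) {
    *kk = (size_t)(s % (long)nb) * BS;
    *jj = (size_t)((s / (long)nb) % (long)nb) * BS;
    *ii = (size_t)(s / ((long)nb * (long)nb)) * BS;
}

/* Helper thread: stay one block step ahead of the compute thread,
 * prefetching the A and B blocks that the next step will read. */
static void *prefetcher(void *arg) {
    shared_t *sh = arg;
    size_t nb = sh->n / BS;
    long total = (long)(nb * nb * nb);
    long last = -1;  /* last step whose blocks were prefetched */
    while (!atomic_load(&sh->done)) {
        long next = atomic_load(&sh->cur) + 1;
        if (next < total && next != last) {
            size_t ii, jj, kk;
            decode(next, nb, &ii, &jj, &kk);
            prefetch_block(sh->a, sh->n, ii, kk);
            prefetch_block(sh->b, sh->n, kk, jj);
            last = next;
        } else {
            sched_yield();  /* nothing new to prefetch yet */
        }
    }
    return NULL;
}

/* C = A * B for n x n row-major matrices; n must be a multiple of BS. */
void matmul_block_prefetch(const double *a, const double *b, double *c, size_t n) {
    assert(n % BS == 0);
    shared_t sh;
    sh.a = a;
    sh.b = b;
    sh.n = n;
    atomic_init(&sh.cur, 0);
    atomic_init(&sh.done, 0);
    for (size_t i = 0; i < n * n; i++) c[i] = 0.0;

    /* If the helper cannot be started, fall back to plain blocking. */
    pthread_t helper;
    int have_helper = pthread_create(&helper, NULL, prefetcher, &sh) == 0;

    long s = 0;
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t jj = 0; jj < n; jj += BS)
            for (size_t kk = 0; kk < n; kk += BS, s++) {
                atomic_store(&sh.cur, s);  /* publish progress to the helper */
                for (size_t i = ii; i < ii + BS; i++)
                    for (size_t k = kk; k < kk + BS; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jj + BS; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }

    atomic_store(&sh.done, 1);
    if (have_helper) pthread_join(helper, NULL);
}
```

Compile with something like `gcc -O2 -pthread`. Prefetch hints never change program results, so the function computes the same product whether or not the helper thread runs. Only the cache behavior differs. Whether this actually speeds things up depends on the machine and on tuning BS.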

Similar resources

Cache Optimizations for Iterative Numerical Codes Aware of Hardware Prefetching

Cache optimizations use code transformations to increase the locality of memory accesses and use prefetching techniques to hide latency. For best performance, hardware prefetching units of processors should be complemented with software prefetch instructions. A cache simulation enhanced with a hardware prefetcher is presented to run code for a 3D multigrid solver. Thus, cache misses not predict...

Off-loading application controlled data prefetching in numerical codes for multi-core processors

An important issue when designing numerical code in High Performance Computing is cache optimization in order to exploit the performance potential of a given target architecture. This includes techniques to improve memory access locality as well as prefetching. Inherent algorithmic constraints often limit the first approach, which typically uses a blocking technique. While there exist automatic pr...

Data Prefetching and Data Forwarding in Shared Memory Multiprocessors

This paper studies and compares the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache coherent, shared memory multiprocessors. Two multiprocessor prefetching algorithms are presented and compared. A simple blocked vector prefetching algorithm, considerably less complex than existing software pipelined ...

Integrating Fine-Grained Message Passing in Cache Coherent Shared Memory Multiprocessors

This paper considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency caused by interprocessor communication in cache coherent, shared memory multiprocessors. Data prefetching is accomplished by using a multiprocessor software pipelined algorithm. Data forwarding is used to target interprocessor data communication, rather than synchronizatio...

The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems

Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques for three types of applications: regular scientific codes, irregular scie...

Publication date: 2006